Keywords: Clustering algorithm, unstructured dataset, classification, descriptive mining, data dictionary
The grouping of large unstructured datasets is one of the main tasks in cluster analysis. A dataset is unstructured if it contains a mixture of data types with no regular pattern, which makes it difficult to search or partition. Because an unstructured dataset has no defined schema, it is also difficult to classify. This study proposes an Enhanced Descriptive Mining Algorithm (EDMA) to group the instances in the input space into a number of clusters. The aim of the study is to partition and analyse a given unstructured dataset according to the distinct features of its constituents. To achieve this objective, the proposed EDMA was implemented in the Java programming language together with a data dictionary, created within the program, that supports the analysis. The unstructured input dataset was retrieved from an open repository and consists of numeric, alphabetic and some special characters. The resulting output is well-clustered data, partitioned according to the similarity of its features. The performance of the proposed technique was evaluated on a number of metrics against two existing techniques: k-means and expectation-maximization (EM) clustering. Findings from this study show that the proposed technique is reliable, accurate and well suited to clustering unstructured datasets.
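The abstract does not specify how EDMA or its data dictionary work, so the following Java sketch is only a hypothetical illustration of the general idea of partitioning tokens that mix numeric, alphabetic and special characters by feature similarity. It assumes that the only feature is which character classes (digits, letters, other symbols) a token contains, and that the "data dictionary" is a fixed lookup table from that signature to a cluster label. The class `DictionaryGrouping`, its methods and the label names are all invented for this example and are not part of EDMA.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical illustration: groups raw tokens by the character classes they contain.
 * This is NOT the EDMA algorithm; the "data dictionary" is an assumed lookup table
 * from a character-class signature to a cluster label.
 */
public class DictionaryGrouping {

    // Assumed data dictionary: signature (digit, letter, special flags) -> cluster label.
    private static final Map<String, String> DATA_DICTIONARY = Map.of(
            "100", "NUMERIC",
            "010", "ALPHABETIC",
            "001", "SPECIAL",
            "110", "ALPHANUMERIC");

    /** Builds a 3-character signature recording which character classes a token contains. */
    static String signature(String token) {
        boolean digit = false, letter = false, special = false;
        for (char c : token.toCharArray()) {
            if (Character.isDigit(c)) {
                digit = true;
            } else if (Character.isLetter(c)) {
                letter = true;
            } else {
                special = true;
            }
        }
        return (digit ? "1" : "0") + (letter ? "1" : "0") + (special ? "1" : "0");
    }

    /** Assigns each token to the cluster its signature maps to; unknown signatures go to MIXED. */
    static Map<String, List<String>> group(List<String> tokens) {
        // LinkedHashMap keeps clusters in the order they are first encountered.
        Map<String, List<String>> clusters = new LinkedHashMap<>();
        for (String token : tokens) {
            String label = DATA_DICTIONARY.getOrDefault(signature(token), "MIXED");
            clusters.computeIfAbsent(label, k -> new ArrayList<>()).add(token);
        }
        return clusters;
    }

    public static void main(String[] args) {
        List<String> raw = List.of("2019", "apple", "#@!", "A12", "x-7", "42");
        System.out.println(group(raw));
    }
}
```

Running `main` prints `{NUMERIC=[2019, 42], ALPHABETIC=[apple], SPECIAL=[#@!], ALPHANUMERIC=[A12], MIXED=[x-7]}`. A real clustering method would compare richer feature vectors with a similarity measure rather than require exact signature matches; the lookup table is used here only to keep the sketch short.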